
    BORG: Block-reORGanization and Self-optimization in Storage Systems

    This paper presents the design, implementation, and evaluation of BORG, a self-optimizing storage system that performs automatic block reorganization based on the observed I/O workload. BORG is motivated by three characteristics of I/O workloads: non-uniform access frequency distribution, temporal locality, and partial determinism in non-sequential accesses. To achieve its objective, BORG manages a small, dedicated partition on the disk drive, with the goal of servicing a majority of the I/O requests from within this partition with significantly reduced seek and rotational delays. BORG is transparent to the rest of the storage stack, including applications, file system(s), and I/O schedulers, and therefore requires little or no modification to storage stack implementations. We evaluated a Linux implementation of BORG using several real-world workloads, including individual user desktop environments, a web server, a virtual machine monitor, and an SVN server. These experiments comprehensively demonstrate BORG's effectiveness in improving I/O performance and quantify the resource overhead it incurs.
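    The idea of servicing most requests from a small dedicated partition can be illustrated with a minimal sketch. This is not BORG's actual algorithm, which the abstract does not describe; it assumes a simple frequency count over a block trace, and the names `pick_hot_blocks`, `build_remap`, `redirect`, and the partition start address are hypothetical.

```python
from collections import Counter

def pick_hot_blocks(trace, capacity):
    """Return the `capacity` most frequently accessed block numbers in `trace`."""
    counts = Counter(trace)
    return [block for block, _ in counts.most_common(capacity)]

def build_remap(hot_blocks, partition_start):
    """Assign each hot block a consecutive slot in the dedicated partition."""
    return {block: partition_start + i for i, block in enumerate(hot_blocks)}

def redirect(block, remap):
    """Translate a request: remapped blocks go to the partition, others pass through."""
    return remap.get(block, block)

# Hypothetical trace of requested block numbers (skewed toward blocks 7 and 900).
trace = [7, 900, 7, 42, 900, 7, 13, 900, 7]
hot = pick_hot_blocks(trace, capacity=2)
remap = build_remap(hot, partition_start=100_000)  # assumed partition address
hits = sum(1 for block in trace if block in remap)
print(f"{hits} of {len(trace)} requests served from the dedicated partition")
```

    With a skewed access distribution, a partition holding only two blocks absorbs most of the requests in this toy trace, which is the non-uniform access frequency property the abstract relies on.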

    Improving storage performance through layout optimizations

    Disk drives are the bottleneck in the processing of large amounts of data used in almost all common applications. File systems attempt to mitigate this by storing data sequentially on the disk drives, thereby reducing the access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch results in storage I/O performance degradation. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives in the same way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns frequently change over short durations of time. We propose, implement, and evaluate layout strategies for each of these. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5–54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve the performance of dynamic access patterns, we implement a self-optimizing storage system that rearranges frequently accessed blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.

    Efficient native XML storage

    XML has emerged as one of the popular data-representation formats for information storage and exchange. XML data today range from representing small files to encapsulating gigabytes of information. Large XML databases must be stored on mass storage devices for both persistence and cost-efficiency. For mass storage of data today, disk drives are the most cost-effective medium. Current approaches of mapping XML data to relational databases or simply using flat files incur a mismatch between the structure of XML data and the underlying storage device (disk drives). In this study, we investigate a new method to store XML data on disk drives that matches the characteristics of XML with those of disk drives. In particular, we present algorithms that, given an XML document and a disk drive, decide how to store the document on the drive in a way that will later allow efficient execution of XML queries. We evaluate our proposed method using analytical modeling and by simulating the execution of benchmark XPath queries.

    Efficient Native Storage for Semi-structured Data

    Semi-structured data is becoming commonplace with examples such as XML, bioinformatics suffix trees, scientific computing data, and even generic directory-file hierarchies. Such semi-structured data must be stored on mass storage devices for persistence as well as cost-efficiency. Current approaches, which map semi-structured data to relational databases or simply use flat files, incur a mismatch between the structure of the data and the underlying storage device (disk drive). In this paper, we explore alternate native strategies for storing semi-structured data that match its access characteristics to those of disk drives, using XML data as a concrete case study. In particular, we present algorithms that, given semi-structured data and a disk drive, decide how to store the data on the drive in a way that will later allow efficient navigation and retrieval. We evaluate our proposed methods using the DiskSim disk simulator and benchmark XPath queries. The experimental results indicate savings of as much as 7X-34X in query execution time for an important class of navigational queries (which we call the non-deep-focused class), compared to the baseline sequential layout of the XML data.

    EXCES: EXternal Caching in Energy Saving Storage Systems

    Power consumption within the disk-based storage subsystem forms a substantial portion of the overall energy footprint in commodity systems. Researchers have proposed external caching on a persistent, low-power storage device, which we term external caching device (ECD), to minimize disk activity and conserve energy. While recent simulation-based studies have argued in favor of this approach, the lack of an actual system implementation has precluded answering several key questions about external caching systems. We present the design and implementation of EXCES, an external caching system that employs prefetching, caching, and buffering of disk data for reducing disk activity. EXCES addresses important questions related to external caching, including the estimation of future data popularity, I/O indirection, continuous reconfiguration of the ECD contents, and data consistency. We evaluated EXCES with both micro- and macro-benchmarks that address idle, I/O intensive, and real-world workloads. Overall system energy savings were found to lie in the modest 2-14% range, depending on the workload, in some contrast to the higher values predicted by earlier studies. Furthermore, while the CPU and memory overheads of EXCES were well within acceptable limits, we found that flash-based external caching can substantially degrade I/O performance. We believe that external caching systems hold promise. Further improvements in ECD technology, in terms of both power consumption and performance characteristics, can help realize the full potential of such systems.